Dynamic Conditional Random Fields for Joint Sentence Boundary and Punctuation Prediction

نویسندگان

  • Xuancong Wang
  • Hwee Tou Ng
  • Khe Chai Sim
چکیده

The use of dynamic conditional random fields (DCRF) has been shown to outperform linear-chain conditional random fields (LCRF) for punctuation prediction on conversational speech texts [1]. In this paper, we combine lexical, prosodic, and modified n-gram score features into the DCRF framework for a joint sentence boundary and punctuation prediction task on TDT3 English broadcast news. We show that the joint prediction method outperforms the conventional two-stage method using LCRF or maximum entropy model (MaxEnt). We show the importance of various features using DCRF, LCRF, MaxEnt, and hidden-event n-gram model (HEN) respectively. In addition, we address the practical issue of feature explosion by introducing lexical pruning, which reduces model size and improves the F1-measure. We adopt incremental local training to overcome memory size limitation without incurring significant performance penalty. Our results show that adding prosodic and n-gram score features gives about 20% relative error reduction in all cases. Overall, DCRF gives the best accuracy, followed by LCRF, MaxEnt, and HEN.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Better Punctuation Prediction with Dynamic Conditional Random Fields

This paper focuses on the task of inserting punctuation symbols into transcribed conversational speech texts, without relying on prosodic cues. We investigate limitations associated with previous methods, and propose a novel approach based on dynamic conditional random fields. Different from previous work, our proposed approach is designed to jointly perform both sentence boundary and sentence ...

متن کامل

Punctuation Prediction using Linear Chain Conditional Random Fields

We investigate the task of punctuation prediction in English sentences without prosodic information. In our approach, stochastic gradient ascent (SGA) is used to maximize log conditional likelihood when learning the parameters of linear-chain conditional random fields. For SGA, two different approximation techniques, namely Collins perceptron and contrastive divergence, are used to estimate the...

متن کامل

Improved models for automatic punctuation prediction for spoken and written text

This paper presents improved models for the automatic prediction of punctuation marks in written or spoken text. Various kinds of textual features are combined using Conditional Random Fields. These features include language model scores, token n-grams, sentence length, and syntactic information extracted from parse trees. The resulting models are evaluated on several different tasks, ranging f...

متن کامل

A CRF Sequence Labeling Approach to Chinese Punctuation Prediction

This paper presents a conditional random fields based labeling approach to Chinese punctuation prediction. To this end, we first reformulate Chinese punctuation prediction as a multiple-pass labeling task on a sequence of words, and then explore various features from three linguistic levels, namely words, phrase and functional chunks for punctuation prediction under the framework of conditional...

متن کامل

Combination of NN and CRF models for joint detection of punctuation and disfluencies

Inserting proper punctuation marks and deleting speech disfluencies are two of the most essential tasks in spoken language processing. This challenging task has prompted extensive research using various techniques, such as conditional random fields. Neural networks, however, are relatively under-explored for this task. Combining different modeling techniques with different advantages has the po...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012